QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models
Authors
Abstract
In this paper, we develop a family of algorithms for optimizing “superposition-structured” or “dirty” statistical estimators for high-dimensional problems, which involve minimizing the sum of a smooth loss function and a hybrid regularization. Most current approaches are first-order methods, including proximal gradient and the Alternating Direction Method of Multipliers (ADMM). We propose a new family of second-order methods in which the loss function is replaced by a quadratic approximation. The superposition-structured regularizer then leads to a subproblem that can be solved efficiently by alternating minimization. We propose a general active subspace selection approach that speeds up the solver by exploiting the low-dimensional structure induced by the regularizers, and we provide convergence guarantees for our algorithm. Empirically, we show that our approach is more than 10 times faster than state-of-the-art first-order approaches for latent variable graphical model selection problems and multi-task learning problems when there is more than one regularizer. For these problems, our approach appears to be the first algorithm that can extend active subspace ideas to multiple regularizers.
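The alternating-minimization step mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes the quadratic model has an identity Hessian (a plain squared-error loss), picks an ℓ1 penalty and an ℓ2 (group) penalty as two example regularizers, and omits active subspace selection entirely. All function names and the sample values are hypothetical.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def block_shrink(v, t):
    """Proximal operator of t * ||.||_2: shrinks the whole vector toward zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]

def objective(y, s, b, lam1, lam2):
    """0.5 * ||y - s - b||^2 + lam1 * ||s||_1 + lam2 * ||b||_2."""
    resid = sum((yi - si - bi) ** 2 for yi, si, bi in zip(y, s, b))
    l1 = sum(abs(x) for x in s)
    l2 = math.sqrt(sum(x * x for x in b))
    return 0.5 * resid + lam1 * l1 + lam2 * l2

def alternating_minimization(y, lam1, lam2, iters=50):
    """Alternately minimize over s (sparse part) and b (group part).

    Assumption: identity Hessian, so each block update has a closed form.
    """
    s = [0.0] * len(y)
    b = [0.0] * len(y)
    history = [objective(y, s, b, lam1, lam2)]
    for _ in range(iters):
        # Exact minimizer over s with b held fixed.
        s = soft_threshold([yi - bi for yi, bi in zip(y, b)], lam1)
        # Exact minimizer over b with s held fixed.
        b = block_shrink([yi - si for yi, si in zip(y, s)], lam2)
        history.append(objective(y, s, b, lam1, lam2))
    return s, b, history

if __name__ == "__main__":
    s, b, history = alternating_minimization([3.0, -0.2, 0.1, 0.3], 0.5, 1.0)
    print("sparse part:", s)
    print("group part:", b)
    print("objective went from", history[0], "to", history[-1])
```

Because each update exactly minimizes the objective over one block while the other is held fixed, the recorded objective values cannot increase from one iteration to the next. With a general Hessian, the block updates would no longer have closed forms and would need an inner solver.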
Similar resources
Analysis of the Environmental Kuznets Curve Using an Environmental Quality Process Subject to Household Consumption Basket Choice
Environmental Kuznets Curve (EKC) theory has evolved over several decades from its initial intuitive conception to the complex theoretical models of today. Through successive steps of empirical and theoretical debate, a quadratic relationship between income and environmental degradation has been proposed, criticized, defended, and criticized again. Along the way, each finding have new look...
ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models
Data cleaning is often an important step to ensure that predictive models, such as regression and classification, are not affected by systematic errors such as inconsistent, out-of-date, or outlier data. Identifying dirty data is often a manual and iterative process, and can be challenging on large datasets. However, many data cleaning workflows can introduce subtle biases into the training pro...
Dirty Statistical Models
We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M-estimators that minimize the sum of any loss function, a...
ActiveClean: Interactive Data Cleaning For Statistical Modeling
Analysts often clean dirty data iteratively–cleaning some data, executing the analysis, and then cleaning more data based on the results. We explore the iterative cleaning process in the context of statistical model training, which is an increasingly popular form of data analytics. We propose ActiveClean, which allows for progressive and iterative cleaning in statistical modeling problems while...
Data Cleaning using Probabilistic Models of Integrity Constraints
In data cleaning, data quality rules provide a valuable tool for enforcing the correct application of semantics on a dataset. Traditional rule discovery techniques assume a reasonably clean dataset, and fail when faced with a dirty one. Enforcement of these rules for error detection is much less effective when mined on dirty data. In the databases literature, a popular and expressive type of lo...